Large lexicon construction for TTS system

Authors

  • Ben-Feng CHEN
  • Guo-Ping HU
  • Ren-Hua WANG
Abstract

A lexicon is an essential part of Chinese information processing. In particular, compared with a basic lexicon, a large, comprehensive lexicon can effectively reduce the complexity and improve the precision of text parsing in a TTS system. However, such a specialized lexicon is hard to construct either entirely by hand or entirely by computer. This paper presents an approach to constructing a large lexicon that combines computer assistance with manual work, including a lexicon-iteration method for generating the large lexicon and a lexicon-word selection step that helps to improve the system. Based on this approach, we have constructed a large lexicon containing about 200,000 words. Experiments show that this large lexicon improves the efficiency of our system by 22.9% and the precision of word segmentation by 19.0%.

Similar resources

Text-to-speech with cross-lingual neural network-based grapheme-to-phoneme models

Modern Text-To-Speech (TTS) systems need to increasingly deal with multilingual input. Navigation, social and news are all domains with a large proportion of foreign words. However, when typical monolingual TTS voices are used, the synthesis quality on such input is markedly lower. This is because traditional TTS derives pronunciations from a lexicon or a Grapheme-To-Phoneme (G2P) model which w...

Efficient Development of Lexical Language Resources and their Representation

Statistical approaches in speech technology, whether used for statistical language models, trees, hidden Markov models or neural networks, represent the driving forces for the creation of language resources (LR), e.g., text corpora, pronunciation and morphology lexicons, and speech databases. This paper presents a system architecture for the rapid construction of morphologic and phonetic lexico...

Hughes Trainable Text Skimmer: description of the TTS system as used for MUC-3

TTS-MUC3 incorporates semi-automated lexicon generation and almost fully automated phrase pattern generation. Associative retrieval from a case memory provides raw data for computing set fills and string fills. TTS-MUC3's modular process model integrates the results of case memory retrieval over sentences from multiple stories, extracts the date and location of incidents, and computes cros...

Duration modeling and memory optimization in a Mandarin TTS system

Current speech synthesis efforts, both in research and in applications, are dominated by methods based on concatenation of spoken units. New progress in concatenative text-to-speech (TTS) technology can be made mainly in two directions: either by reducing the memory footprint so that the system can be integrated into an embedded system, or by improving the synthesized speech quality in terms of intelligi...

On Mirandese language resources for text-to-speech

This paper aims at describing the major components of the first Text-to-Speech (TTS) system ever built for Mirandese, a minority language spoken in the Northeast of Portugal. Both language resources development (corpus, text normalization rules, annotated lexicon, phone sets and recordings) and the TTS (Statistical Parameter Synthesis) system are documented here.

Publication date: 2002